Abstract
This study explores US House election results from 1976 to 2022, focusing on national House of Representatives trends from 2016 to 2020 and on state-wide trends in Arizona from 2012 to 2022. For 2016, 2018, and 2020, the data were wrangled into three subsets, one for each election cycle. To simplify the analysis, all parties other than the Republican and Democratic parties were grouped into a new generalized party called “Other”. Each subset was then displayed on a US map, where the fill of a state indicates which party holds the majority of its House districts and the label indicates which party won the state's popular vote. This revealed how the district-based electoral system can produce representatives who do not reflect the sentiment of the state's population as a whole. Zooming in on Arizona, voting trends were analyzed over a longer time frame of 10 years, from 2012 to 2022. The voting method and the change in winning party in each district were visualized to show how the voting method influences the winning candidate, and how that in turn influences the winning party in each district.
Insights into voting trends are important for politicians and the general population because they are directly related to other aspects of life, such as the economy and global politics. These insights can also help politicians identify states where voter sentiment can make or break a campaign.
The analysis is primarily a time-series analysis: each variable was viewed over time, and the changes were noted as insights. One limitation of this project is the assumption that the minor parties, grouped together, behave as a single third party that votes differently from Democrats and Republicans. In reality, some minor parties align more closely with one of the major parties, while others vote on their own platforms. Ultimately, this study provides insights into US voting patterns and the impact of election results on future voting trends.
Introduction
Introducing the Dataset
The dataset, US House Election Results, is sourced from the MIT Election Data and Science Lab (MEDSL) and offers a comprehensive overview of US House elections.
It contains results for elections held over the 47 years from 1976 to 2022, with a total of 32,452 records. Each record is a row describing one candidate's result in one race, with 20 attributes as columns. These columns include the year, state, district, political party, candidate's name, votes received, and indicators such as whether the election was a runoff or the candidate was a write-in. The loaded table printed below has 21 columns because a State_Population column has been added to the 20 MEDSL attributes.
EDA
# A tibble: 32,452 × 21
year state state_po state_fips state_cen state_ic office district stage
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
2 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
3 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
4 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
5 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
6 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
7 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
8 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
9 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
10 1976 ALABAMA AL 1 63 41 US HOUSE 4 GEN
# ℹ 32,442 more rows
# ℹ 12 more variables: runoff <dbl>, special <dbl>, candidate <chr>,
# party <chr>, writein <dbl>, mode <chr>, candidatevotes <dbl>,
# totalvotes <dbl>, unofficial <dbl>, version <dbl>, fusion_ticket <dbl>,
# State_Population <chr>
Summary statistics for the numeric columns:

| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| year | 1976 | 1988 | 2000 | 2000 | 2012 | 2022 |
| state_fips | 1.00 | 17.00 | 31.00 | 28.76 | 40.00 | 56.00 |
| state_cen | 11.00 | 23.00 | 52.00 | 50.95 | 74.00 | 95.00 |
| state_ic | 1.00 | 14.00 | 40.00 | 37.09 | 52.00 | 82.00 |
| district | 0.000 | 3.000 | 6.000 | 9.848 | 13.000 | 53.000 |
| runoff | 0.0000000 | 0.0000000 | 0.0000000 | 0.0002465 | 0.0000000 | 1.0000000 |
| special | 0.000000 | 0.000000 | 0.000000 | 0.002773 | 0.000000 | 1.000000 |
| writein | 0.00000 | 0.00000 | 0.00000 | 0.08412 | 0.00000 | 1.00000 |
| candidatevotes | -1 | 4324 | 57328 | 66825 | 112144 | 1165136 |
| totalvotes | -1 | 162266 | 206983 | 215165 | 263386 | 2656104 |
| unofficial | 0.000000 | 0.000000 | 0.000000 | 0.001202 | 0.000000 | 1.000000 |
| version | 20230706 | 20230706 | 20230706 | 20230706 | 20230706 | 20230706 |
| fusion_ticket | 0.00000 | 0.00000 | 0.00000 | 0.08135 | 0.00000 | 1.00000 |

The character columns (state, state_po, office, stage, candidate, party, mode, State_Population) each have length 32452, class character, and mode character.
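No columns contain null (NA) values. However, missing data appear to be encoded with placeholders: candidatevotes and totalvotes have a minimum of -1, and State_Population contains the string "#N/A", so these values need handling before analysis.
Question 1: How did congressional voting trends change from 2016 to 2020?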
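A null check alone does not catch placeholder values. The following is a minimal Python sketch, assuming each record is a dict keyed by the MEDSL column names shown above; the function name `count_placeholders` and the two placeholder constants are assumptions based on the values visible in the EDA output, not part of the original analysis.

```python
# Hypothetical sketch: count placeholder "missing" values that a plain null
# check does not flag. The placeholder values are taken from the EDA output.
PLACEHOLDER_NUMERIC = -1   # observed minimum of candidatevotes and totalvotes
PLACEHOLDER_TEXT = "#N/A"  # observed in State_Population


def count_placeholders(rows):
    """Return {column: count} of placeholder values found in each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value == PLACEHOLDER_NUMERIC or value == PLACEHOLDER_TEXT:
                counts[column] = counts.get(column, 0) + 1
    return counts


sample = [
    {"candidatevotes": -1, "totalvotes": 100, "State_Population": "#N/A"},
    {"candidatevotes": 5, "totalvotes": -1, "State_Population": "1000"},
]
placeholder_counts = count_placeholders(sample)
```

A non-empty result signals columns that need recoding (for example to a true missing value) before vote totals are summed.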
Introduction
The goal of Q1 is to understand how voting trends in the House of Representatives change over the course of a presidency. To answer this question, district majority (the number of districts a party won) and party vote majority (the number of votes a party received overall) are visualized. The data are narrowed down to district, party, candidate votes, and state for the years 2016, 2018, and 2020. The interest in this question stems from the recent volatility in the US political climate and the desire to understand how voter sentiment changed over the last presidential cycle. House data are used rather than presidential election data because House elections are held every two years and, because many more candidates are elected, they are a more direct reflection of voters' beliefs.
Approach
Election data from 2016, 2018, and 2020 were extracted from the dataset. To quantify party majority metrics in a simpler way, all parties other than the Republican and Democratic parties were grouped into a new generalized party called “Other”. For each year of interest, a map of the USA is plotted with all 50 states, where the fill color and the label color of each state represent different voting metrics. The fill of each state is based on which party won the most districts, called “House Majority”. The label color of each state is based on which party received the most votes overall, called “Popular Vote Majority”. The three maps are then arranged in tabs so that the reader can toggle between years. Maps suit this question because they present state-level data in an accessible way. Each party is drawn in its traditional color, making it easy to associate colors with parties.
Election data from 2016, 2018, and 2020 were analyzed to determine, for each state and election year, which party held the majority of the state's House seats and which party received the most total votes. Together, these two measures describe the party distribution and voting trends across states in those election years.
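The two per-state measures can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes rows are dicts with the MEDSL columns `year`, `state`, `district`, `stage`, `special`, `party`, and `candidatevotes`, one row per candidate per race, and it keeps only regular general-election rows. The function names and the tie-breaking rule (alphabetical) are hypothetical choices.

```python
from collections import defaultdict

MAJOR_PARTIES = {"DEMOCRAT", "REPUBLICAN"}


def simplify_party(party):
    """Collapse every party except the two major ones into "OTHER"."""
    return party if party in MAJOR_PARTIES else "OTHER"


def top_key(counts):
    """Key with the largest value; ties go to the alphabetically first key."""
    return max(sorted(counts), key=counts.get)


def state_majorities(rows, year):
    """Return {state: (house_majority, popular_vote_majority)} for one year."""
    leader = {}  # (state, district) -> (votes, party) of the leading candidate
    state_votes = defaultdict(lambda: defaultdict(int))  # state -> party -> votes
    for row in rows:
        # Assumption: keep regular general-election rows only.
        if row["year"] != year or row["stage"] != "GEN" or row["special"]:
            continue
        party = simplify_party(row["party"])
        votes = max(row["candidatevotes"], 0)  # treat the -1 placeholder as 0
        state_votes[row["state"]][party] += votes
        race = (row["state"], row["district"])
        if race not in leader or votes > leader[race][0]:
            leader[race] = (votes, party)

    seats = defaultdict(lambda: defaultdict(int))  # state -> party -> districts won
    for (state, _district), (_votes, party) in leader.items():
        seats[state][party] += 1

    return {
        state: (top_key(seats[state]), top_key(by_party))
        for state, by_party in state_votes.items()
    }


# Illustrative rows only (not real results): one state, three districts.
sample_rows = [
    {"year": 2018, "state": "ARIZONA", "district": d, "stage": "GEN",
     "special": 0, "party": p, "candidatevotes": v}
    for d, p, v in [
        (1, "DEMOCRAT", 100), (1, "REPUBLICAN", 90),
        (2, "DEMOCRAT", 101), (2, "REPUBLICAN", 99),
        (3, "REPUBLICAN", 300), (3, "LIBERTARIAN", 10),
    ]
]
majorities = state_majorities(sample_rows, 2018)
# States where the map fill (House) and label (popular vote) disagree.
mismatched = [s for s, (house, popular) in majorities.items() if house != popular]
# majorities == {"ARIZONA": ("DEMOCRAT", "REPUBLICAN")}, mismatched == ["ARIZONA"]
```

The toy data show the kind of mismatch the maps are designed to expose: Democrats win two of three districts narrowly, while one lopsided Republican win gives Republicans the state-wide popular vote.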
Analysis
The years analyzed for Q1 are 2016, 2018, and 2020. 2016 and 2020 were presidential election years, which occur every four years, while 2018 was a midterm year. For each year, a map of the USA is plotted with all states, where the fill color and the label color of each state represent different voting metrics. The fill of each state is based on which party won the most districts, called “House Majority”. The label color of each state is based on which party received the most votes overall, called “Popular Vote Majority”.
We generated a map of the United States for 2018, where each state is colored by the party that won in the House of Representatives and labeled with the party that received the most votes. The map uses different colors for the Republican and Democratic parties. Of the 50 states, the Democratic Party won 42 and the Republican Party won only one, North Carolina, in 2018.
We generated the corresponding map for 2020, again coloring each state by the party that won in the House of Representatives and labeling it with the party that received the most votes. Comparing the 2016, 2018, and 2020 results, the Democratic Party lost 13 states in 2020, the most of the three cycles, while the Republican Party gained the most states relative to the previous years.
Discussion
The visualization of the House of Representatives broken down by district majority and popular vote majority provides several insights: 1) the sentiment of the general population is often overlooked by the district-based electoral system; 2) within a single presidency, the sentiment of the people can change significantly; and 3) well-known trends can be seen, such as midterm elections favoring the party opposite the incumbent president.
Question 2: How often did House representation change in Arizona from 2012 to 2022, and which voting methods played a significant role in these elections?
Approach
The analysis will look at election results from 2012, 2016, and 2022 to see how Arizona’s congressional district alignments have changed over time. The goal is to identify patterns of political representation inside the state’s districts, detecting movements in party control over both short-term and decade-long periods. Furthermore, the analysis will look into the influence of various voting procedures, specifically their impact on election outcomes in Arizona.
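Feature Extraction and Data Wrangling for Voting Methods
Feature Extraction: two new features were created, Type_of_Voting and RESULT (Win or Loss). Both were derived with custom functions: Type_of_Voting combines the numeric 0/1 indicator columns (runoff, special, writein, unofficial, fusion_ticket) into a single categorical label, and RESULT labels each candidate by vote count.
Extracting the First Feature: Type_of_Voting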
# A tibble: 32,452 × 22
year state state_po state_fips state_cen state_ic office district stage
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
2 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
3 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
4 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
5 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
6 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
7 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
8 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
9 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
10 1976 ALABAMA AL 1 63 41 US HOUSE 4 GEN
# ℹ 32,442 more rows
# ℹ 13 more variables: runoff <dbl>, special <dbl>, candidate <chr>,
# party <chr>, writein <dbl>, mode <chr>, candidatevotes <dbl>,
# totalvotes <dbl>, unofficial <dbl>, version <dbl>, fusion_ticket <dbl>,
# State_Population <chr>, Type_of_Voting <chr>
A “Type_of_Voting” column has been added to the dataset, categorizing each entry as a normal election, fusion ticket, runoff, special election, write-in, or unofficial result. This feature lets the analysis compare outcomes across the different election processes reflected in the data.
Extracting the Second Feature: RESULT (Win/Loss)
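One way to derive such a label is sketched below in Python. It assumes each row is a dict holding MEDSL's 0/1 indicator columns; the precedence order (the first flag equal to 1 wins) and every label string other than "Normal" and "writein" (the two values visible in the data) are hypothetical.

```python
# Hypothetical precedence: the first indicator flag equal to 1 decides the label.
# "Normal" and "writein" match labels visible in the data; the other label
# strings are illustrative placeholders.
FLAG_LABELS = [
    ("unofficial", "unofficial"),
    ("runoff", "runoff"),
    ("special", "special"),
    ("fusion_ticket", "fusion_ticket"),
    ("writein", "writein"),
]


def type_of_voting(row):
    """Derive a single Type_of_Voting label from MEDSL's 0/1 indicator columns."""
    for column, label in FLAG_LABELS:
        if row.get(column) == 1:
            return label
    return "Normal"  # no indicator set


rows = [
    {"writein": 1, "runoff": 0, "special": 0},
    {"writein": 0, "runoff": 0, "special": 0},
]
labels = [type_of_voting(r) for r in rows]  # ["writein", "Normal"]
```

Because a row can carry more than one flag, an explicit precedence list makes the outcome predictable; a different order would change labels only for rows with several flags set.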
spc_tbl_ [32,452 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ year : num [1:32452] 1976 1976 1976 1976 1976 ...
$ state : chr [1:32452] "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA" ...
$ state_po : chr [1:32452] "AL" "AL" "AL" "AL" ...
$ state_fips : num [1:32452] 1 1 1 1 1 1 1 1 1 1 ...
$ state_cen : num [1:32452] 63 63 63 63 63 63 63 63 63 63 ...
$ state_ic : num [1:32452] 41 41 41 41 41 41 41 41 41 41 ...
$ office : chr [1:32452] "US HOUSE" "US HOUSE" "US HOUSE" "US HOUSE" ...
$ district : num [1:32452] 1 1 1 2 2 2 3 3 3 4 ...
$ stage : chr [1:32452] "GEN" "GEN" "GEN" "GEN" ...
$ runoff : num [1:32452] 0 0 0 0 0 0 0 0 0 0 ...
$ special : num [1:32452] 0 0 0 0 0 0 0 0 0 0 ...
$ candidate : chr [1:32452] "BILL DAVENPORT" "JACK EDWARDS" "WRITEIN" "J CAROLE KEAHEY" ...
$ party : chr [1:32452] "DEMOCRAT" "REPUBLICAN" "WRITE-IN (INDEPENDENT)" "DEMOCRAT" ...
$ writein : num [1:32452] 0 0 1 0 0 1 0 0 1 0 ...
$ mode : chr [1:32452] "TOTAL" "TOTAL" "TOTAL" "TOTAL" ...
$ candidatevotes : num [1:32452] 58906 98257 7 66288 90069 ...
$ totalvotes : num [1:32452] 157170 157170 157170 156362 156362 ...
$ unofficial : num [1:32452] 0 0 0 0 0 0 0 0 0 0 ...
$ version : num [1:32452] 20230706 20230706 20230706 20230706 20230706 ...
$ fusion_ticket : num [1:32452] 0 0 0 0 0 0 0 0 0 0 ...
$ State_Population: chr [1:32452] "#N/A" "#N/A" "#N/A" "#N/A" ...
$ Type_of_Voting : chr [1:32452] "Normal" "Normal" "writein" "Normal" ...
# A tibble: 32,452 × 23
year state state_po state_fips state_cen state_ic office district stage
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
2 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
3 1976 ALABAMA AL 1 63 41 US HOUSE 1 GEN
4 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
5 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
6 1976 ALABAMA AL 1 63 41 US HOUSE 2 GEN
7 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
8 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
9 1976 ALABAMA AL 1 63 41 US HOUSE 3 GEN
10 1976 ALABAMA AL 1 63 41 US HOUSE 4 GEN
# ℹ 32,442 more rows
# ℹ 14 more variables: runoff <dbl>, special <dbl>, candidate <chr>,
# party <chr>, writein <dbl>, mode <chr>, candidatevotes <dbl>,
# totalvotes <dbl>, unofficial <dbl>, version <dbl>, fusion_ticket <dbl>,
# State_Population <chr>, Type_of_Voting <chr>, RESULT <chr>
A new column, RESULT, is added to the dataset, labelling a candidate “Win” if they received the most votes in their race and “Loss” otherwise. This separates winning and losing candidates by vote count.
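A minimal Python sketch of this labelling follows. The race definition in `RACE_KEYS` is an assumption, as is summing votes per candidate across rows first, which guards against races reported as several `mode` rows instead of a single TOTAL row. Candidate names, mode values, and vote counts in the sample are made up.

```python
from collections import defaultdict

# Assumption: these columns together identify one race.
RACE_KEYS = ("year", "state", "district", "stage", "runoff", "special")


def race_of(row):
    """Tuple identifying the race a row belongs to."""
    return tuple(row[k] for k in RACE_KEYS)


def add_result(rows):
    """Return copies of rows with RESULT set to "Win" or "Loss"."""
    # Sum votes per candidate within each race, so a race reported as several
    # per-mode rows is compared on each candidate's full total.
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[race_of(row)][row["candidate"]] += max(row["candidatevotes"], 0)

    # Highest candidate total in each race.
    top = {race: max(by_cand.values()) for race, by_cand in totals.items()}

    labelled = []
    for row in rows:
        race = race_of(row)
        # Every candidate tied for the top total is labelled "Win".
        won = totals[race][row["candidate"]] == top[race]
        labelled.append({**row, "RESULT": "Win" if won else "Loss"})
    return labelled


# Illustrative race split into two vote modes (names and numbers are made up).
base = {"year": 2012, "state": "ARIZONA", "district": 1, "stage": "GEN",
        "runoff": 0, "special": 0}
sample = [
    {**base, "candidate": "CANDIDATE A", "mode": "ELECTION DAY", "candidatevotes": 50},
    {**base, "candidate": "CANDIDATE A", "mode": "ABSENTEE", "candidatevotes": 60},
    {**base, "candidate": "CANDIDATE B", "mode": "ELECTION DAY", "candidatevotes": 70},
    {**base, "candidate": "CANDIDATE B", "mode": "ABSENTEE", "candidatevotes": 30},
]
results = [r["RESULT"] for r in add_result(sample)]  # ["Win", "Win", "Loss", "Loss"]
```

Comparing row by row here would wrongly make CANDIDATE B's 70-vote row the top row, even though CANDIDATE A has more votes in total (110 vs 100).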
Data Wrangling
Arizona's vote data from 2012 to 2022 are filtered and analyzed to examine the various voting methods and how they affect candidate outcomes. Focusing on year, voting type, and election result, the analysis computes the frequency and percentage of each result. These data make it possible to visualize how Arizona's district-level electoral results evolved over the decade.
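The frequency and percentage step can be sketched in Python as shown here, assuming rows already carry the `Type_of_Voting` and `RESULT` columns and that the `state` column uses upper-case names as in the printed data. Computing percentages within each (year, voting type) group is an assumption about how the chart labels were normalised; the helper `make_row` and the sample rows are illustrative only.

```python
from collections import Counter


def result_shares(rows, state="ARIZONA", start=2012, end=2022):
    """Return {(year, Type_of_Voting, RESULT): (count, percent)} for one state.

    Percentages are computed within each (year, Type_of_Voting) group.
    """
    counts = Counter(
        (row["year"], row["Type_of_Voting"], row["RESULT"])
        for row in rows
        if row["state"] == state and start <= row["year"] <= end
    )
    group_totals = Counter()
    for (year, voting_type, _result), n in counts.items():
        group_totals[(year, voting_type)] += n
    return {
        key: (n, round(100 * n / group_totals[key[:2]], 1))
        for key, n in counts.items()
    }


def make_row(year, voting_type, result, state="ARIZONA"):
    """Build a minimal illustrative row (hypothetical helper)."""
    return {"state": state, "year": year, "Type_of_Voting": voting_type, "RESULT": result}


sample = (
    [make_row(2012, "Normal", "Win")]
    + [make_row(2012, "Normal", "Loss")] * 3
    + [make_row(2012, "writein", "Loss")] * 2
    + [make_row(2010, "Normal", "Win"), make_row(2012, "Normal", "Win", state="ALABAMA")]
)
shares = result_shares(sample)
# {(2012, "Normal", "Win"): (1, 25.0), (2012, "Normal", "Loss"): (3, 75.0),
#  (2012, "writein", "Loss"): (2, 100.0)}
```

Rows outside Arizona or outside 2012 to 2022 are dropped before counting, so they do not affect the percentages.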
2012
The resulting visualisation is a colour-coded map of Arizona's congressional districts in 2012, based on the party that won each seat: blue for Democrats and red for Republicans. The map gives a picture of Arizona's political distribution in 2012, highlighting regions of Democratic and Republican strength. It shows that most of Arizona's congressional districts leaned toward the Democratic Party in that election cycle.
2016
The resulting visualisation is a colour-coded map of Arizona's congressional districts in 2016, using the same colours: blue for Democrats and red for Republicans. A key feature of this map is the use of transparency to highlight shifts in representation: districts whose winning party changed since the previous map (2012) are drawn with lower opacity, while districts with no change keep full colour saturation. The map shows that the 6th congressional district changed from the Democratic Party to the Republican Party in the 2016 election cycle.
2022
The resulting visualisation is a colour-coded map of Arizona's congressional districts in 2022, using the same colour scheme (blue for Democrats, red for Republicans) and the same transparency encoding, with lower opacity marking districts whose winning party changed since 2016. It shows that the 1st and 9th districts changed from the Democratic Party to the Republican Party, while the 4th congressional district changed from the Republican Party to the Democratic Party.
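The colour and transparency encoding can be sketched in Python as below. It assumes winners are already available as `{district_number: party}` dicts for two years; the function name, the 0.5 alpha for changed districts, and the grey fallback for other parties are hypothetical choices. Comparing by district number also assumes that a numbered district is comparable across years. Arizona redrew its district lines for the 2022 election, so, for example, district 1 in 2022 does not cover the same area as district 1 in 2016.

```python
PARTY_COLOURS = {"DEMOCRAT": "blue", "REPUBLICAN": "red"}


def district_styles(before, after, faded_alpha=0.5):
    """Return {district: (colour, alpha)}; districts whose party changed are faded."""
    styles = {}
    for district, party in after.items():
        previous = before.get(district)
        changed = previous is not None and previous != party
        colour = PARTY_COLOURS.get(party, "grey")  # assumed fallback for other parties
        styles[district] = (colour, faded_alpha if changed else 1.0)
    return styles


# Illustrative winners keyed by district number (not real results).
before = {1: "DEMOCRAT", 2: "REPUBLICAN", 3: "DEMOCRAT"}
after = {1: "REPUBLICAN", 2: "REPUBLICAN", 3: "DEMOCRAT"}
styles = district_styles(before, after)
# {1: ("red", 0.5), 2: ("red", 1.0), 3: ("blue", 1.0)}
```

The resulting (colour, alpha) pairs can be passed to whichever mapping library draws the district polygons.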
Discussion
An interesting trend emerges from Arizona's congressional district maps: the political landscape rarely changes over short periods, such as between two elections only a few years apart (2012 and 2016). This consistency points to strong party loyalty over that period. A decade of observation tells a different story, however: there are notable changes in party representation across districts. This suggests a slow but significant shift in voter attitudes and party affiliations. Even though districts rarely switch parties over one or two election cycles, a ten-year period reveals a dynamic, changing political landscape. This long-term perspective gives a better understanding of the complexity and gradual nature of changes in voter preferences and party power.
Discussion Q.2 Part 2
The visualization of outcomes by voting type from 2012 to 2022 distinguishes wins from losses with color-coded bars and labeled percentages. It shows that candidates using conventional voting methods had varying success rates over the decade, with a significant decline in 2020 followed by a notable recovery in 2022. In contrast, write-in candidates lost year after year. This suggests that, although conventional voting methods dominate, they do not guarantee success, and it underlines how difficult it remains for write-in candidates to win in Arizona. Overall, the charts show how electoral outcomes shifted across voting methods over time and offer insight into the evolving political dynamics and voter preferences within the state.